Housing valuation is an area in which statistical models can play a role, and the models frequently used for it can also be applied to other price structures. This project is concerned with finding the most reliable determinants of property prices. The dataset is a subset of anonymised mortgage records for Greater London. The purchase price (which is different from the asking price) is available, as is a series of characteristics of each property. The goal is to find the best group of predictors of property price.
Describe the methods for property price prediction.
Identify the significant predictors of housing prices in London.
Estimate the spatial variation, by borough, in the influence of floor area on price.
Geographically weighted regression (GWR) is a spatial extension of linear regression. In an ordinary linear regression, each predictor has a single coefficient that tells how much the response changes for a unit change in that predictor. In GWR, the coefficient varies with location: it is no longer global but is estimated separately for each location from nearby observations. Allowing coefficients to vary in this way can reduce the bias that arises when a single global coefficient is forced onto a spatially varying relationship, and it gives a more detailed analysis.
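For reference, the usual GWR formulation can be written as below. The notation is introduced here and is not taken from the report: $(u_i, v_i)$ are the coordinates of observation $i$, $d_{ij}$ is the distance between observations $i$ and $j$, and $b$ is the kernel bandwidth. The Gaussian kernel is shown because it is the kernel named in the model output later in the report.

```latex
\begin{aligned}
y_i &= \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i \\
\hat{\boldsymbol{\beta}}(u_i, v_i) &= \left(X^\top W_i X\right)^{-1} X^\top W_i\, y,
  \qquad W_i = \operatorname{diag}(w_{i1}, \dots, w_{in}) \\
w_{ij} &= \exp\!\left(-\frac{d_{ij}^{2}}{2 b^{2}}\right)
\end{aligned}
```

A global linear regression is the special case in which every weight equals 1, so that the same coefficients are estimated at every location.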
## Observations: 12,536
## Variables: 31
## $ X <int> 53, 73, 78, 95, 125, 153, 182, 189, 203, 207, 215, 21...
## $ Easting <int> 545500, 525000, 531100, 538500, 534000, 528700, 53490...
## $ Northing <int> 173000, 177800, 183400, 169400, 168400, 168800, 18700...
## $ Purprice <int> 85000, 71000, 60000, 64000, 260000, 48500, 34500, 559...
## $ BldIntWr <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ BldPostW <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ Bld60s <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0,...
## $ Bld70s <int> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Bld80s <int> 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0,...
## $ TypDetch <int> 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,...
## $ TypSemiD <int> 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0,...
## $ TypFlat <int> 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0,...
## $ GarSingl <int> 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1,...
## $ GarDoubl <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Tenfree <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,...
## $ CenHeat <int> 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1,...
## $ BathTwo <int> 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ BedTwo <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ BedThree <int> 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1,...
## $ BedFour <int> 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,...
## $ BedFive <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ NewPropD <int> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,...
## $ FlorArea <dbl> 76.16146, 98.45262, 124.73761, 127.00000, 190.40366, ...
## $ NoCarHh <dbl> 50.2793, 14.6342, 36.4162, 17.8082, 7.4074, 14.0187, ...
## $ CarspP <dbl> 25.2451, 46.8865, 37.8049, 47.5936, 67.6966, 49.5512,...
## $ ProfPct <dbl> 0.0000, 6.2500, 0.0000, 0.0000, 9.0909, 16.6667, 0.00...
## $ UnskPct <dbl> 11.1111, 0.0000, 11.1111, 0.0000, 0.0000, 8.3333, 0.0...
## $ RetiPct <dbl> 88.8889, 12.5000, 77.7778, 75.0000, 36.3636, 50.0000,...
## $ Saleunem <dbl> 19.2308, 5.3571, 5.2632, 8.8235, 3.6765, 3.0769, 9.74...
## $ Unemploy <dbl> 85.53494, 32.82623, 31.61733, 0.12889, 21.88766, 31.6...
## $ PopnDnsy <dbl> 11.48515, 8.29268, 7.81671, 18.18182, 8.22222, 3.7523...
Convert dummies to factors - more convenient for modelling.
To build a model that predicts the price of property in London, some variables should first be reorganised.
Age: the period in which the property was constructed. It is derived from the variables BldIntWr, BldPostW, Bld60s, Bld70s and Bld80s. Its levels are PreWW1, BldIntWr, BldPostW, Bld60s, Bld70s and Bld80s.
Type: the type of building. It is derived from the variables TypDetch, TypSemiD and TypFlat. Its levels are TypDetch, TypSemiD, TypFlat and Bungalow.
Garage: the garage provision of the property. It is derived from the variables GarSingl and GarDoubl. Its levels are HardStnd, GarSingl and GarDoubl.
Bedrooms: the number of bedrooms in the property. It is derived from the variables BedTwo, BedThree, BedFour and BedFive. Its levels are BedOne, BedTwo, BedThree, BedFour and BedFive.
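The conversion described above can be sketched in base R as follows. This is a minimal illustration on made-up rows, not the report's own code: the helper `dummies_to_factor` and the data frame `toy` are hypothetical names, and the sketch assumes that a row with none of the dummies set belongs to the remaining level (here Bungalow).

```r
# Toy data with the same dummy columns as the report (values are made up).
toy <- data.frame(
  TypDetch = c(1, 0, 0, 0),
  TypSemiD = c(0, 1, 0, 0),
  TypFlat  = c(0, 0, 1, 0)
)

# Collapse mutually exclusive 0/1 columns into one factor.
# Rows with no dummy set are assigned to `baseline`.
dummies_to_factor <- function(df, cols, baseline, levels = c(baseline, cols)) {
  m <- as.matrix(df[, cols])
  stopifnot(all(rowSums(m) <= 1))   # the dummies must be mutually exclusive
  idx <- ifelse(rowSums(m) == 0, 0L, max.col(m, ties.method = "first"))
  labels <- c(baseline, cols)
  factor(labels[idx + 1L], levels = levels)
}

toy$Type <- dummies_to_factor(
  toy,
  cols     = c("TypDetch", "TypSemiD", "TypFlat"),
  baseline = "Bungalow",
  levels   = c("TypDetch", "TypSemiD", "TypFlat", "Bungalow")
)
```

The `levels` argument controls which level comes first and therefore which one `lm()` treats as the reference category.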
## Observations: 12,536
## Variables: 20
## $ Easting <int> 545500, 525000, 531100, 538500, 534000, 528700, 53490...
## $ Northing <int> 173000, 177800, 183400, 169400, 168400, 168800, 18700...
## $ Purprice <int> 85000, 71000, 60000, 64000, 260000, 48500, 34500, 559...
## $ Tenfree <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,...
## $ CenHeat <fct> 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1,...
## $ BathTwo <fct> 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ NewPropD <fct> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,...
## $ FlorArea <dbl> 76.16146, 98.45262, 124.73761, 127.00000, 190.40366, ...
## $ ProfPct <dbl> 0.0000, 6.2500, 0.0000, 0.0000, 9.0909, 16.6667, 0.00...
## $ Age <fct> Bld60s, Bld80s, PreWW1, Bld80s, Bld80s, PreWW1, Bld70...
## $ Type <fct> TypDetch, TypDetch, TypSemiD, TypDetch, TypDetch, Typ...
## $ Garage <fct> GarSingl, GarSingl, HardStnd, GarSingl, GarDoubl, Har...
## $ Bedrooms <fct> BedThree, BedThree, BedFour, BedThree, BedFour, BedTh...
## $ NoCarHh <dbl> 50.2793, 14.6342, 36.4162, 17.8082, 7.4074, 14.0187, ...
## $ CarspP <dbl> 25.2451, 46.8865, 37.8049, 47.5936, 67.6966, 49.5512,...
## $ UnskPct <dbl> 11.1111, 0.0000, 11.1111, 0.0000, 0.0000, 8.3333, 0.0...
## $ RetiPct <dbl> 88.8889, 12.5000, 77.7778, 75.0000, 36.3636, 50.0000,...
## $ Saleunem <dbl> 19.2308, 5.3571, 5.2632, 8.8235, 3.6765, 3.0769, 9.74...
## $ Unemploy <dbl> 85.53494, 32.82623, 31.61733, 0.12889, 21.88766, 31.6...
## $ PopnDnsy <dbl> 11.48515, 8.29268, 7.81671, 18.18182, 8.22222, 3.7523...
The Purprice variable.
The price of most properties is under 600,000, but there is an outlier that is much larger than the others. It would distort the results of the analysis.
Delete the outlier, which is over 600,000.
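A minimal sketch of this filtering step, on made-up prices, is shown below. The variable `price_cap` is a hypothetical name; only the 600,000 threshold and the names `MyData` and `Purprice` come from the report. The later model summaries are consistent with a single row being removed: 12,507 residual degrees of freedom plus 28 coefficients gives 12,535 observations, one fewer than the 12,536 listed above.

```r
# Toy prices (made up); the report's data frame is called MyData.
MyData <- data.frame(Purprice = c(85000, 71000, 260000, 850000))

# Keep only sales at or below the 600,000 threshold named in the text.
price_cap <- 600000
MyData <- MyData[MyData$Purprice <= price_cap, , drop = FALSE]
nrow(MyData)
```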
Plot the Purprice versus FlorArea.
Floor area and price show a roughly linear relationship: price increases with floor area, the slope is roughly constant, and no clear curvature is present.
Plot the Purprice versus ProfPct.
Since ProfPct takes on only a few values, a linear relationship is inadequate.
Plot the Purprice versus NoCarHh.
Plot the Purprice versus CarspP.
Plot the Purprice versus UnskPct.
Plot the Purprice versus RetiPct.
Plot the Purprice versus Saleunem.
Plot the Purprice versus Unemploy.
Plot the Purprice versus PopnDnsy.
Based on the plots above, none of these variables has a strong linear relationship with the dependent variable, property price. For cars per person in the neighbourhood and the proportion of households with an unskilled head, the fitted line is almost horizontal, which means they have no linear relationship with property price. For the other variables, most of the points are clustered tightly near the origin, and the trend of the fitted lines is mainly influenced by outliers. The conclusion is the same: the relationship between these variables and property price is very weak.
Plot the Purprice versus CenHeat.
The plot shows that houses with central heating are priced higher than houses without it, although the difference in average price is not large. A plausible reason is comfort: central heating is available at any time, whereas other heating needs to be set up before use, which can cause discomfort in some cases.
Plot the Purprice versus Garage.
From the garage plot, we can clearly see that the median price of houses with a double garage is a lot higher than that of houses with a single garage. Garage size is related to the size of the house: we assume that a single-room property would not come with a double garage, which is more plausible for larger houses.
Plot the Purprice versus BathTwo.
Furthermore, houses with two bathrooms are also priced higher on average, and the difference between one bathroom and two is much larger. Intuitively, a second bathroom is more convenient, and houses with more than one bathroom are typically bigger because of the interior layout.
Plot the Purprice versus Bedrooms.
Finally, consider the number of bedrooms. Houses with one and two bedrooms do not differ by much, and even three bedrooms makes little difference to the median price. However, with four or five bedrooms, the median price jumps markedly.
Plot the Purprice versus Age.
The next predictor is the age of the house. From the plots, we can see that housing built before World War 1 has the greatest spread of prices. One possible reason is location: the earliest housing could be built on the best sites, which would help to explain the spread of prices.
The type of house also influences the price. For example, we have detached homes, semi-detached homes and flats. Detached homes have the highest prices, as they offer more privacy and a better layout. Semi-detached homes come next; they are still good, but they lack the privacy of a fully detached house. Flats are at the end of the list, since there is little privacy if the insulation between units is not done well.
In the plots above, we can see that the type of property is an important factor in its price. Properties with central heating tend to be more expensive. As the number of garages, bathrooms and bedrooms goes up, the price of the property shows an increasing trend. However, the age of the property seems to have little influence on its price. Large houses clearly cost more; however, as the size of the house goes up, fewer data are available. As we can see from the Purprice versus FlorArea plot, the left side is densely covered with data and the right side has far fewer points.
With all the predictors examined, we move to our multiple linear regression model. We first use the lm() function in R to fit it.
model.9v <- lm(Purprice ~ FlorArea + Bedrooms + Type + BathTwo + Garage + Tenfree + CenHeat + Age + ProfPct + NewPropD + NoCarHh + CarspP + UnskPct + RetiPct + Saleunem + Unemploy + PopnDnsy, data = MyData)
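Written out, the model is:
$$\text{Purprice} = b_0 + b_1\,\text{FlorArea} + b_2\,\text{Bedrooms} + b_3\,\text{Type} + \dots + b_{17}\,\text{PopnDnsy}$$
Each factor predictor, such as Bedrooms or Type, stands for a set of dummy terms, one per non-reference level.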
Given values for the predictors, this model predicts the price of a house in the London data from the estimated coefficients. We then want to find the predictors that have the most impact on price. First, we use AIC to compare models built from each predictor on its own. Next, we fit a model with all predictors and keep those that are significant. Finally, we refit the model with the significant predictors and check their VIFs to avoid collinearity.
To choose significant variables for the model, we fit a separate model of the response on each predictor and report the AIC of each model in the table above. Floor area is the most important predictor of property price. The number of bedrooms, the number of bathrooms and the property type also have a large impact on price.
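The single-predictor comparison can be sketched as below. This is an illustration on simulated data rather than the report's code: the data frame `toy`, its coefficients and the helper table `aic_table` are assumptions made for the example, and only the variable names are borrowed from the dataset.

```r
set.seed(1)
# Synthetic stand-in for the housing data (all values are simulated).
n <- 200
toy <- data.frame(
  FlorArea = runif(n, 40, 200),
  CenHeat  = factor(rbinom(n, 1, 0.7)),
  PopnDnsy = runif(n, 1, 20)
)
toy$Purprice <- 10000 + 700 * toy$FlorArea +
  10000 * (toy$CenHeat == "1") + rnorm(n, sd = 20000)

# Fit Purprice ~ <predictor> for each predictor in turn and collect the AIC.
predictors <- setdiff(names(toy), "Purprice")
aic_table <- data.frame(
  predictor = predictors,
  AIC = sapply(predictors, function(p) {
    AIC(lm(reformulate(p, response = "Purprice"), data = toy))
  }),
  row.names = NULL
)
aic_table <- aic_table[order(aic_table$AIC), ]   # lowest AIC first
aic_table
```

A lower AIC indicates a better trade-off between fit and complexity, so the predictor at the top of the sorted table is the strongest single predictor in this comparison.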
##
## Call:
## lm(formula = Purprice ~ ., data = MyData[, 3:20])
##
## Residuals:
## Min 1Q Median 3Q Max
## -136420 -13483 -1322 10340 371624
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12216.791 3234.459 3.777 0.000159 ***
## Tenfree1 6215.453 1352.354 4.596 4.35e-06 ***
## CenHeat1 11856.974 754.564 15.714 < 2e-16 ***
## BathTwo1 24029.638 1202.984 19.975 < 2e-16 ***
## NewPropD1 1879.988 1545.371 1.217 0.223807
## FlorArea 677.818 11.379 59.570 < 2e-16 ***
## ProfPct 45.261 24.909 1.817 0.069235 .
## AgeBldIntWr 3996.542 657.059 6.082 1.22e-09 ***
## AgeBldPostW -1134.041 975.441 -1.163 0.245017
## AgeBld60s -7318.654 1089.956 -6.715 1.97e-11 ***
## AgeBld70s -6713.012 1164.455 -5.765 8.36e-09 ***
## AgeBld80s 357.120 1026.580 0.348 0.727941
## TypeTypSemiD -12406.759 1005.914 -12.334 < 2e-16 ***
## TypeTypFlat -17328.944 1032.138 -16.789 < 2e-16 ***
## TypeBungalow -5754.224 1658.808 -3.469 0.000524 ***
## GarageGarSingl 3773.963 614.680 6.140 8.52e-10 ***
## GarageGarDoubl 9279.791 1676.432 5.535 3.17e-08 ***
## BedroomsBedTwo -3399.740 869.034 -3.912 9.20e-05 ***
## BedroomsBedThree -7863.395 1068.092 -7.362 1.92e-13 ***
## BedroomsBedFour -1709.352 1542.268 -1.108 0.267738
## BedroomsBedFive 3973.657 2504.121 1.587 0.112573
## NoCarHh -12.783 30.913 -0.414 0.679247
## CarspP -19.105 40.555 -0.471 0.637592
## UnskPct -40.730 36.679 -1.110 0.266830
## RetiPct -7.091 5.618 -1.262 0.206928
## Saleunem 54.662 60.992 0.896 0.370154
## Unemploy 9.737 5.808 1.677 0.093629 .
## PopnDnsy 42.660 40.423 1.055 0.291287
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 27150 on 12507 degrees of freedom
## Multiple R-squared: 0.5652, Adjusted R-squared: 0.5643
## F-statistic: 602.2 on 27 and 12507 DF, p-value: < 2.2e-16
We then fit a linear model with all predictors (output above). The output shows that the proportion of households without a car, cars per person in the neighbourhood, the proportion of households with a professional head, the proportion of households with an unskilled head, the proportion of retired residents, the unemployment measures, the new-property indicator and local population density are not significant at the 5% level (NoCarHh, CarspP, ProfPct, UnskPct, RetiPct, Saleunem, Unemploy, NewPropD and PopnDnsy). This conclusion is the same as the one we get from the correlation coefficient table, so these variables are removed from the model.
##
## Call:
## lm(formula = Purprice ~ Tenfree + CenHeat + BathTwo + FlorArea +
## Age + Type + Garage + Bedrooms, data = MyData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -135607 -13414 -1328 10382 371167
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12225.50 2113.53 5.784 7.45e-09 ***
## Tenfree1 6140.30 1352.02 4.542 5.64e-06 ***
## CenHeat1 11854.63 754.65 15.709 < 2e-16 ***
## BathTwo1 24053.12 1202.55 20.002 < 2e-16 ***
## FlorArea 678.08 11.37 59.613 < 2e-16 ***
## AgeBldIntWr 4051.07 656.87 6.167 7.16e-10 ***
## AgeBldPostW -1136.03 975.15 -1.165 0.244050
## AgeBld60s -7336.41 1089.76 -6.732 1.75e-11 ***
## AgeBld70s -6707.95 1164.43 -5.761 8.57e-09 ***
## AgeBld80s 935.35 898.71 1.041 0.298003
## TypeTypSemiD -12381.18 1005.73 -12.311 < 2e-16 ***
## TypeTypFlat -17269.10 1031.65 -16.739 < 2e-16 ***
## TypeBungalow -5695.79 1658.13 -3.435 0.000594 ***
## GarageGarSingl 3776.80 614.47 6.146 8.16e-10 ***
## GarageGarDoubl 9257.72 1676.29 5.523 3.40e-08 ***
## BedroomsBedTwo -3450.07 869.10 -3.970 7.24e-05 ***
## BedroomsBedThree -7869.64 1068.21 -7.367 1.85e-13 ***
## BedroomsBedFour -1743.82 1541.90 -1.131 0.258095
## BedroomsBedFive 3937.18 2504.29 1.572 0.115935
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 27150 on 12516 degrees of freedom
## Multiple R-squared: 0.5647, Adjusted R-squared: 0.564
## F-statistic: 901.9 on 18 and 12516 DF, p-value: < 2.2e-16
## Tenfree CenHeat BathTwo FlorArea Age Type Garage Bedrooms
## 6.722 1.030 1.253 3.032 1.561 9.769 1.480 4.112
We then build a model with all significant predictors and check collinearity using the VIF. In the table above, the VIF of property type is very high (9.769), so it is removed from the model. In the next step, the dataset is separated into training and testing sets; the linear model is built on the training set and evaluated on the testing set.
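As a reminder of what a VIF measures, the sketch below computes it in base R for simulated numeric predictors. It is an illustration only: the variables `x1`, `x2` and `x3` are made up, and the sketch does not reproduce the table above, whose factor terms are usually summarised by a generalised VIF that this code does not compute.

```r
set.seed(2)
# Simulated numeric predictors; x2 is built to be nearly collinear with x1.
n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.2)
x3 <- rnorm(n)
X  <- cbind(x1 = x1, x2 = x2, x3 = x3)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the others.
vif_by_regression <- sapply(colnames(X), function(j) {
  others <- setdiff(colnames(X), j)
  r2 <- summary(lm(X[, j] ~ X[, others]))$r.squared
  1 / (1 - r2)
})

# Equivalent shortcut: the diagonal of the inverse correlation matrix.
vif_by_inverse <- diag(solve(cor(X)))
```

Predictors that can be predicted well from the others (here `x1` and `x2`) get a large VIF, while an unrelated predictor (`x3`) stays close to 1.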
##
## Call:
## lm(formula = Purprice ~ FlorArea + Bedrooms + BathTwo + Garage +
## Tenfree + CenHeat + Age, data = trainData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -131677 -13606 -1649 10507 364562
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2386.22 1529.39 1.560 0.118744
## FlorArea 715.26 14.97 47.766 < 2e-16 ***
## BedroomsBedTwo -5323.44 1148.02 -4.637 3.59e-06 ***
## BedroomsBedThree -10485.73 1398.61 -7.497 7.26e-14 ***
## BedroomsBedFour -2291.56 2049.23 -1.118 0.263493
## BedroomsBedFive 5954.67 3305.12 1.802 0.071640 .
## BathTwo1 24198.54 1585.61 15.261 < 2e-16 ***
## GarageGarSingl 6137.83 793.09 7.739 1.13e-14 ***
## GarageGarDoubl 15060.80 2180.87 6.906 5.40e-12 ***
## Tenfree1 -1618.72 939.73 -1.723 0.085013 .
## CenHeat1 12268.51 1000.50 12.262 < 2e-16 ***
## AgeBldIntWr 5663.90 858.19 6.600 4.40e-11 ***
## AgeBldPostW 2312.66 1279.79 1.807 0.070791 .
## AgeBld60s -5397.07 1436.10 -3.758 0.000172 ***
## AgeBld70s -5712.37 1522.99 -3.751 0.000178 ***
## AgeBld80s 3592.95 1174.30 3.060 0.002224 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 27920 on 7505 degrees of freedom
## Multiple R-squared: 0.5553, Adjusted R-squared: 0.5544
## F-statistic: 624.8 on 15 and 7505 DF, p-value: < 2.2e-16
## [1] 777827779
## [1] 723418875
In the output above, the two printed values are mean squared errors. The first, 777,827,779, matches the training fit (its residual sum of squares of 5.850043e+12 divided by 7,521 observations), so the mean squared error on the testing dataset is 723,418,875, which is slightly lower than that on the training dataset. For floor area, a 1 square metre increase raises the average property price by 715.26 GBP, keeping other predictors constant. The average price of properties with central heating is higher than that of properties without it by 12,268.51 GBP, keeping other predictors constant.
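The split and the two mean squared errors can be sketched as follows, on simulated data. The 60/40 proportion is an assumption; it is consistent with the 7,521 training rows reported out of 12,535, but the report does not state how the split was made. The helper `mse` and the object `testData` are hypothetical names.

```r
set.seed(3)
# Simulated stand-in for MyData (values are made up).
n <- 1000
MyData <- data.frame(FlorArea = runif(n, 40, 200))
MyData$Purprice <- 2000 + 700 * MyData$FlorArea + rnorm(n, sd = 25000)

# Random 60/40 split; the proportion is an assumption.
train_idx <- sample(seq_len(n), size = round(0.6 * n))
trainData <- MyData[train_idx, ]
testData  <- MyData[-train_idx, ]

fit <- lm(Purprice ~ FlorArea, data = trainData)

# Mean squared error: average squared gap between observed and predicted price.
mse <- function(model, data) {
  mean((data$Purprice - predict(model, newdata = data))^2)
}
mse_train <- mse(fit, trainData)
mse_test  <- mse(fit, testData)
```

A test error close to the training error, as in the report, suggests the model is not overfitting the training set.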
##
## Call:
## lm(formula = Purprice ~ x + y + I(x^2) + I(y^2) + I(x * y), data = MyData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -73924 -24782 -9828 9862 444261
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.153e+06 8.741e+05 -3.607 0.000311 ***
## x 1.225e+04 2.793e+03 4.387 1.16e-05 ***
## y 3.525e+02 2.916e+03 0.121 0.903766
## I(x^2) -1.074e+01 2.555e+00 -4.203 2.66e-05 ***
## I(y^2) 7.372e+00 4.717e+00 1.563 0.118080
## I(x * y) -5.727e+00 4.323e+00 -1.325 0.185350
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 41050 on 12529 degrees of freedom
## Multiple R-squared: 0.004172, Adjusted R-squared: 0.003774
## F-statistic: 10.5 on 5 and 12529 DF, p-value: 4.507e-10
We fit a model on the coordinates x and y (Easting and Northing), together with their squares and interaction, to test whether location significantly influences property price. The result shows that properties tend to have a lower price as we move east, and this influence is significant, although location alone explains very little of the variation in price (R-squared of 0.004). It is therefore necessary to consider the geographic effect when predicting property prices.
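A minimal sketch of such a second-order trend surface is given below, on simulated coordinates and prices. The report does not show how `x` and `y` were built; rescaling Easting and Northing to kilometres is an assumption made here, and the data frame `toy` is hypothetical.

```r
set.seed(4)
# Simulated coordinates and prices (made up); x and y are assumed to be
# Easting and Northing rescaled to kilometres.
n <- 300
toy <- data.frame(
  Easting  = runif(n, 505000, 560000),
  Northing = runif(n, 157000, 200000)
)
toy$x <- toy$Easting / 1000
toy$y <- toy$Northing / 1000
toy$Purprice <- 80000 - 150 * (toy$x - 530) + rnorm(n, sd = 30000)

# Second-order trend surface: linear, squared and interaction terms in x and y.
trend <- lm(Purprice ~ x + y + I(x^2) + I(y^2) + I(x * y), data = toy)
summary(trend)$r.squared   # share of price variance explained by location alone
```

Rescaling the coordinates keeps the squared and interaction terms at a manageable magnitude, which makes the coefficients easier to read.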
## OGR data source with driver: ESRI Shapefile
## Source: "E:\AcademicYear\semester2\3_CaseStudies\project\git\GY683", layer: "LondonBoroughs"
## with 33 features
## It has 15 fields
## Integer64 fields read as strings: NUMBER NUMBER0 POLYGON_ID UNIT_ID
Property Price Versus Borough
In the plot above, we can see that the median property price differs between London boroughs. In the City of London especially, property prices are considerably higher than in other boroughs.
Standardised Residuals Versus Borough
In the plot of standardised residuals against borough, we reach the same conclusion: the distribution of residuals differs between boroughs. If we fit a model that accounts for the borough effect, the result might be better. We will now run a geographically weighted regression model to see how the coefficients of the model might vary across London.
First we will calibrate the bandwidth of the kernel that will be used to capture the points for each regression (this may take a little while) and then run the model:
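To make the two steps concrete, the base R sketch below selects a fixed Gaussian bandwidth by leave-one-out cross-validation and then computes local coefficients. It is not the GWmodel implementation: the data are simulated, the helper names `local_coef` and `cv_score` are hypothetical, and the use of cross-validation as the selection criterion is an assumption, since the report does not say which criterion was used.

```r
set.seed(5)
# Simulated points (made up): the slope on `area` grows from west to east.
n      <- 150
coords <- cbind(u = runif(n, 0, 10), v = runif(n, 0, 10))
area   <- runif(n, 40, 200)
price  <- 5000 + (600 + 20 * coords[, "u"]) * area + rnorm(n, sd = 5000)
X      <- cbind(1, area)              # design matrix with an intercept column
D      <- as.matrix(dist(coords))     # pairwise Euclidean distances

# Local coefficients at point i: weighted least squares with Gaussian weights.
local_coef <- function(i, bw, drop_self = FALSE) {
  w <- exp(-0.5 * (D[i, ] / bw)^2)
  if (drop_self) w[i] <- 0            # leave point i out for cross-validation
  lm.wfit(X, price, w)$coefficients
}

# CV score: squared error when each point is predicted without itself.
cv_score <- function(bw) {
  pred <- sapply(seq_len(n), function(i) sum(X[i, ] * local_coef(i, bw, TRUE)))
  sum((price - pred)^2)
}

# Search for the bandwidth with the lowest CV score, then fit at every point.
bw     <- optimize(cv_score, interval = c(1, 15))$minimum
slopes <- sapply(seq_len(n), function(i) local_coef(i, bw)[2])
```

Each point is refitted once per candidate bandwidth, so the cost grows quickly with the number of observations, which is why this step can take a while on the full dataset.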
## ***********************************************************************
## * Package GWmodel *
## ***********************************************************************
## Program starts at: 2020-05-10 00:06:28
## Call:
## gwr.basic(formula = Purprice ~ FlorArea + Bedrooms + BathTwo +
## Garage + Tenfree + CenHeat + Age, data = map, bw = bw, kernel = "gaussian")
##
## Dependent (y) variable: Purprice
## Independent variables: FlorArea Bedrooms BathTwo Garage Tenfree CenHeat Age
## Number of data points: 7521
## ***********************************************************************
## * Results of Global Regression *
## ***********************************************************************
##
## Call:
## lm(formula = formula, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -131677 -13606 -1649 10507 364562
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2386.22 1529.39 1.560 0.118744
## FlorArea 715.26 14.97 47.766 < 2e-16 ***
## BedroomsBedTwo -5323.44 1148.02 -4.637 3.59e-06 ***
## BedroomsBedThree -10485.73 1398.61 -7.497 7.26e-14 ***
## BedroomsBedFour -2291.56 2049.23 -1.118 0.263493
## BedroomsBedFive 5954.67 3305.12 1.802 0.071640 .
## BathTwo1 24198.54 1585.61 15.261 < 2e-16 ***
## GarageGarSingl 6137.83 793.09 7.739 1.13e-14 ***
## GarageGarDoubl 15060.80 2180.87 6.906 5.40e-12 ***
## Tenfree1 -1618.72 939.73 -1.723 0.085013 .
## CenHeat1 12268.51 1000.50 12.262 < 2e-16 ***
## AgeBldIntWr 5663.90 858.19 6.600 4.40e-11 ***
## AgeBldPostW 2312.66 1279.79 1.807 0.070791 .
## AgeBld60s -5397.07 1436.10 -3.758 0.000172 ***
## AgeBld70s -5712.37 1522.99 -3.751 0.000178 ***
## AgeBld80s 3592.95 1174.30 3.060 0.002224 **
##
## ---Significance stars
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 27920 on 7505 degrees of freedom
## Multiple R-squared: 0.5553
## Adjusted R-squared: 0.5544
## F-statistic: 624.8 on 15 and 7505 DF, p-value: < 2.2e-16
## ***Extra Diagnostic information
## Residual sum of squares: 5.850043e+12
## Sigma(hat): 27893.27
## AIC: 175347.7
## AICc: 175347.8
## ***********************************************************************
## * Results of Geographically Weighted Regression *
## ***********************************************************************
##
## *********************Model calibration information*********************
## Kernel function: gaussian
## Fixed bandwidth: 6103.785
## Regression points: the same locations as observations are used.
## Distance metric: Euclidean distance metric is used.
##
## ****************Summary of GWR coefficient estimates:******************
## Min. 1st Qu. Median 3rd Qu. Max.
## Intercept -11554.796 254.395 4904.294 7410.595 11034.18
## FlorArea 612.043 676.780 707.772 732.984 821.42
## BedroomsBedTwo -12787.458 -6204.371 -4945.664 -3734.715 -1174.21
## BedroomsBedThree -20027.410 -12955.621 -10648.990 -7608.165 -3183.76
## BedroomsBedFour -16583.906 -7625.159 -3919.857 3315.564 9841.25
## BedroomsBedFive -27505.494 -11882.768 8012.577 16527.424 80111.69
## BathTwo1 8647.801 21953.435 25329.002 29390.062 39318.85
## GarageGarSingl 2231.026 4538.113 5867.882 8153.808 11417.84
## GarageGarDoubl 5302.221 12128.844 15679.468 18372.378 28602.52
## Tenfree1 -7362.917 -3018.066 -1110.351 655.130 4225.97
## CenHeat1 6789.054 10087.689 12008.319 13652.972 17671.62
## AgeBldIntWr 66.841 1648.497 3842.940 8534.553 13274.22
## AgeBldPostW -4929.118 -2849.870 -355.913 5982.302 14275.50
## AgeBld60s -13142.216 -8457.267 -5520.874 -2885.553 1271.68
## AgeBld70s -11356.027 -9566.743 -7285.795 -4238.382 3812.07
## AgeBld80s -2849.667 -290.098 2618.808 5548.411 13068.94
## ************************Diagnostic information*************************
## Number of data points: 7521
## Effective number of parameters (2trace(S) - trace(S'S)): 182.7964
## Effective degrees of freedom (n-2trace(S) + trace(S'S)): 7338.204
## AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 175082.7
## AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 174949
## Residual sum of squares: 5.479639e+12
## R-square value: 0.5834657
## Adjusted R-square value: 0.5730883
##
## ***********************************************************************
## Program stops at: 2020-05-10 00:07:38
The output from the GWR model reveals how the coefficients vary across London. The global coefficients are exactly the same as the coefficients in the earlier linear model fitted to the training data. Taking floor area as an example, the local coefficients range from a minimum of 612.043 GBP (a 1 square metre increase in floor area is associated with an increase of 612.043 GBP in average property price) to a maximum of 821.42 GBP. Because the regression points are the observation locations, the quartiles summarise properties rather than boroughs: for half of the properties in the dataset, a 1 square metre increase in floor area raises the price by between 676.780 GBP and 732.984 GBP (the inter-quartile range between the 1st and 3rd quartiles).
Coefficient ranges can also be seen for the other variables, and they suggest some interesting spatial patterning. To explore this, we can plot the GWR coefficients for different variables. First we attach the coefficients to our original data frame. This is straightforward because the coefficients for each observation appear in the same order in the spatial points data frame as they do in the original data frame.
Take the first plot, which shows the floor area coefficients. In the boroughs north of the city centre, a 1 square metre increase in floor area corresponds to the largest change in property price, whereas in the boroughs south of the city centre it corresponds to the smallest. This is an interesting pattern; it may partly be explained by buyers in the northern boroughs placing a higher value on floor area, so that floor area has a stronger influence on price there.
The second plot is for central heating. In the west and east of London, having central heating changes the price by less than 10,000 GBP. In the boroughs north and south of the city centre, central heating matters much more: the price of a property with central heating is 12,500 to 17,500 GBP higher than that of one without.
Similar effects can be seen for the other predictors in the model: their coefficients differ between boroughs.
In conclusion, the most reliable determinants of property prices are floor area, the number of bedrooms, having a second bathroom, garage provision, central heating, the leasehold/freehold indicator and the age of the property. Although the global model with these predictors gives a good result for predicting property prices, it does not consider the spatial component. On the training data, the GWR model fits better than the global model, with a lower AICc (175082.7 against 175347.8) and a higher adjusted R-squared (0.573 against 0.554), so GWR is a better way to estimate the price of property.